Lecture 9 Simple Random Sample and Parameter Estimation

9.1 Simple random sample

Following the principle of simple random sampling, each sample point is randomly and independently selected from the same population. Therefore all sample points are independent and identically distributed random variables, often denoted as i.i.d. random variables. The sample mean, the average of all sample points, is itself also a random variable. We want to calculate the expected value and variance of the sample mean. Let \(X_1, X_2,...,X_N\) be the i.i.d. sample points with \(E(X_i)=\mu\) and \(Var(X_i)=\sigma^2\); the sample mean is given by \(\overline{X} = \frac{1}{N}\sum_{i=1}^NX_i\).

Expected value of sample mean

\[E(\overline{X})=E\Big(\frac{1}{N}\sum_{i=1}^NX_i\Big) = \frac{1}{N}\sum_{i=1}^NE(X_i) = \frac{1}{N}\sum_{i=1}^N\mu = \mu\]

Variance of sample mean

\[Var(\overline{X})=Var\Big(\frac{1}{N}\sum_{i=1}^NX_i\Big) = \frac{1}{N^2}Var\Big(\sum_{i=1}^NX_i\Big) = \frac{1}{N^2}\sum_{i=1}^NVar(X_i) = \frac{\sigma^2}{N}\]
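The two results above can be checked with a short Monte Carlo sketch. The numbers here (\(\mu=10\), \(\sigma=3\), \(N=25\)) are illustrative choices, not from the text:

```python
import random

random.seed(0)
mu, sigma, N = 10.0, 3.0, 25   # illustrative population parameters and sample size
reps = 20_000                  # number of simulated samples

# Draw many samples of size N and record each sample mean
means = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(N)]
    means.append(sum(sample) / N)

est_mean = sum(means) / reps
est_var = sum((m - est_mean) ** 2 for m in means) / (reps - 1)
print(est_mean)   # close to mu = 10
print(est_var)    # close to sigma^2 / N = 9 / 25 = 0.36
```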

9.1.1 Law of large numbers (LLN)

The sample mean \(\overline{X}\) has an interesting property: its expected value is \(\mu\) and its variance is \(\frac{\sigma^2}{N}\), which decreases as \(N\) increases. As \(N \to \infty\), the variance converges to 0, which implies all possible values of \(\overline{X}\) will become increasingly concentrated around the point \(\mu\). This phenomenon, where the sample mean approaches the population mean as the sample size increases, is known as the law of large numbers (LLN). It can be denoted as: \[\overline{X} \overset{p}{\longrightarrow}\mu\] The law of large numbers (LLN) indicates that the sample average is a good estimator of the population mean. Although the sample average is still a realisation of the random variable \(\overline{X}\), the LLN tells us this realisation should be very close to \(\mu\). Moreover, as the sample size increases the sample average will get closer to \(\mu\).
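A minimal simulation of the LLN, using an illustrative population with \(\mu=5\) and \(\sigma=2\): as the sample size grows, the sample average settles down around \(\mu\).

```python
import random

random.seed(1)
mu, sigma = 5.0, 2.0   # illustrative population parameters

# Sample averages for increasing sample sizes settle down around mu = 5
for N in [10, 100, 10_000, 1_000_000]:
    xbar = sum(random.gauss(mu, sigma) for _ in range(N)) / N
    print(N, xbar)
```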

9.1.2 Central limit theorem (CLT)

The law of large numbers (LLN) tells us that the sample average \(\overline{X}\) converges to the population mean \(\mu\). In practice, however, we often want to assess the uncertainty of \(\overline{X}\) as an estimate of \(\mu\). If the population is normal, i.e. \(X\sim N(\mu,\sigma^2)\), it can be shown that the sample average is also normal. Since a normal distribution is fully determined by its expected value and variance, we have: \[\overline{X} \sim N(\mu,\frac{\sigma^2}{N})\] If the population is not normally distributed, the distribution of the sample average \(\overline{X}\) will still approach a normal distribution under very general conditions. According to the central limit theorem (CLT), regardless of the shape of the population distribution, the distribution of the sample average \(\overline{X}\) will converge to a normal distribution as the sample size \(N\) becomes large, provided certain regularity conditions are met. Specifically, the CLT states: \[\overline{X} \overset{d}{\longrightarrow} N(\mu,\frac{\sigma^2}{N})\] The central limit theorem (CLT) helps us gauge the uncertainty of using \(\overline{X}\) as an estimate for \(\mu\).
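A quick sketch of the CLT in action: even for a clearly non-normal population (here an exponential distribution with \(\mu=\sigma=1\), chosen purely for illustration), the standardized sample means behave like a standard normal, with roughly 68% within one standard deviation and 95% within two.

```python
import random

random.seed(2)
N, reps = 30, 10_000
mu, sigma = 1.0, 1.0   # exponential(1) population: mean 1, sd 1, clearly non-normal

z = []
for _ in range(reps):
    xbar = sum(random.expovariate(1.0) for _ in range(N)) / N
    z.append((xbar - mu) / (sigma / N ** 0.5))   # standardize using the CLT scaling

# For a standard normal, about 68% of draws fall within 1 and 95% within 2
within1 = sum(abs(v) <= 1 for v in z) / reps
within2 = sum(abs(v) <= 2 for v in z) / reps
print(within1, within2)
```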

In statistics a function of sample points is called a sample statistic. For example, the sample average is a sample statistic. The distribution of a sample statistic is referred to as a sampling distribution. In some textbooks the distribution of \(\overline{X}\) in particular is referred to as the sampling distribution.

9.2 Parameter estimation

Estimation involves using information from sample data to make inferences about the population. Since the population is described by the probability distribution’s density, determining the parameters of this distribution is crucial for understanding the population. The primary task in inferential statistics is to estimate these population parameters based on sample information.

Principles of Estimation:

In this unit we always use the sample average \(\overline{x}\) to estimate the population expected value \(\mu = E(X)\) and the sample variance \(s^{2}\) to estimate the population variance \(\sigma^{2} = Var(X)\). Formally:

\(\hat{\mu}=\overline{x}\)

\(\hat{\sigma}^{2}=s^{2}\)

\(\hat{\sigma} = s\)

This way of estimation is called moment estimation in statistics.
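A small worked example of moment estimation, using Python's standard `statistics` module on a hypothetical sample of salary figures (the numbers are made up for illustration):

```python
import statistics

# Hypothetical sample of 10 salary observations (in $K); illustrative numbers only
sample = [48.2, 55.1, 50.3, 61.0, 47.5, 53.8, 49.9, 58.4, 51.2, 52.6]

mu_hat = statistics.mean(sample)       # moment estimate of mu
var_hat = statistics.variance(sample)  # sample variance s^2 (divides by N - 1)
sd_hat = statistics.stdev(sample)      # s = sqrt(s^2)

print(mu_hat, var_hat, sd_hat)
```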

9.2.1 Point estimation

Point estimation: providing a single value for one parameter:

We use the sample average to estimate the population mean and the sample variance to estimate the population variance:

\(\hat{\mu}=\overline{x}\)

\(\hat{\sigma} = s\)

The “hat” over a parameter is the notation of “estimate” used by statisticians.

9.2.2 Interval estimation

How reliable is \(\overline{x}\) as an estimate for \(\mu\)?

We can use the scenario of Lotto to draw a parallel. Last week the winning numbers were 1, 9, 13, 25, 32, 31, 40. Can we use these numbers to predict the winning numbers next week? Obviously NO, because before the draw we cannot predict the winning numbers for sure (this is why we call them random variables). Similarly, before the selection (draw) of the 300 employees we cannot predict with certainty that the average salary is $52.5K. So if we redo the survey the average salary will be a different number! How can we answer the question:

How reliable is 52.5 as an estimate for \(\mathbf{\mu}\)?

When we consider the question of what the sample average would be if we redid the sampling, we are regarding the sample average no longer as the constant number 52.5 but as a random variable \(\overline{X}\).

The distribution of \(\overline{X}\) and its relation to \(\mu\) can be used to answer this question. Applying rules of 68, 95 and 99.7 to \(\overline{X}\) and \(SD(\overline{X})\) we can develop an interval estimation:

\(0.95 \approx P( \mu - 2SD(\overline{X}) \le \overline{X} \le \mu + 2SD(\overline{X}))\)

\(= P\left( - \overline{X} + \mu - 2SD(\overline{X}) < 0 < \mu + {2SD(\overline{X})} - \overline{X} \right)\)

\(= P\left( - \overline{X} - {2SD(\overline{X})} < - \mu < - \overline{X} + {2SD(\overline{X})} \right)\)

\(= P\left( \overline{X} + 2SD(\overline{X}) > \mu > \overline{X} -2SD(\overline{X}) \right) \approx 0.95\)

Replacing \(SD(\overline{X})\) by \(\frac{\sigma}{\sqrt{N}}\) we have

\[ P\left( \overline{X} - 2\frac{\sigma}{\sqrt{N}} < \mu < \overline{X} + 2\frac{\sigma}{\sqrt{N}} \right) \approx 0.95\]

The last equation above states that the probability that the parameter \(\mu\) lies within the interval \(\left( \overline{X} - 2\frac{\sigma}{\sqrt{N}},\overline{X} + 2\frac{\sigma}{\sqrt{N}} \right)\) is approximately 95%. This probability is called the confidence level, because we are 95% confident that \(\mu\) is contained in this interval. In the literature the standard deviation of the sample average, \(\frac{\sigma}{\sqrt{N}}\), is known as the standard error.
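This 95% interval can be computed directly. The sketch below reuses the survey's \(\overline{x}=52.5\) and \(N=300\) and assumes a hypothetical known \(\sigma=8\):

```python
from statistics import NormalDist

# Survey numbers from the text (xbar = 52.5 $K, N = 300); sigma = 8 is a
# hypothetical "known" population standard deviation for illustration
xbar, sigma, N = 52.5, 8.0, 300

z = NormalDist().inv_cdf(0.975)   # z_{alpha/2} for alpha = 0.05, about 1.96
se = sigma / N ** 0.5             # standard error of the sample mean
lo, hi = xbar - z * se, xbar + z * se
print(lo, hi)
```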

The confidence level is denoted as \(1-\alpha\), where \(\alpha\) is the probability that the interval does not contain the parameter. As we know, the rule of 68, 95 and 99.7 is approximate. To be more precise we should replace 2 by \(z_\frac{\alpha}{2}=1.96\), the absolute value of the 2.5th percentile of the standard normal distribution (for \(\alpha=0.05\)). So we have

\(P\left( \overline{X} - z_{0.05/2}\frac{\sigma}{\sqrt{N}} < \mu < \overline{X} + z_{0.05/2}\frac{\sigma}{\sqrt{N}} \right) = 0.95 = 1 - 0.05\)

Although 1-\(\alpha=0.95\) is the most commonly used confidence level, in general \(\alpha\) can be any small probability. The general formulas are:

\(\left( \overline{X} - z_{\frac{\alpha}{2}}\ \frac{\sigma}{\sqrt{N}},\overline{X} + z_{\frac{\alpha}{2}}\ \frac{\sigma}{\sqrt{N}} \right)\)

\(\left( \overline{X} - t_{\frac{\alpha}{2},N - 1}\ \frac{s}{\sqrt{N}},\overline{X} + t_{\frac{\alpha}{2},N - 1}\ \frac{s}{\sqrt{N}} \right)\)

\(\left( \overline{p} - z_{\frac{\alpha}{2}}\ \sqrt{\frac{p(1 - p)}{N}},\overline{p} + z_{\frac{\alpha}{2}}\ \sqrt{\frac{p(1 - p)}{N}} \right)\)

The second formula applies to the case where \(\sigma\) is unknown. When \(\sigma\) is unknown, we replace it by the estimate \(\hat{\sigma} = s\). Because \(\hat{\sigma}\) is not exactly \(\sigma\), we need to adjust the confidence interval to make sure the confidence level remains \(1-\alpha\). To account for this, we use the corresponding percentile of the \(t\) distribution with \(N-1\) degrees of freedom, denoted \(t_{\frac{\alpha}{2},N-1}\), instead of the standard normal quantile \(z_{\frac{\alpha}{2}}\). The value \(t_{\frac{\alpha}{2},N-1}\) is slightly larger than \(z_{\frac{\alpha}{2}}\) to account for the additional uncertainty introduced by replacing \(\sigma\) by \(\hat{\sigma}=s\). As we know, the more data we have the more precise the estimate becomes; therefore \(t_{\frac{\alpha}{2},N-1}\) depends on the sample size \(N\). As \(N \to \infty\), \(t_{\frac{\alpha}{2},N-1} \to z_{\frac{\alpha}{2}}\).
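The shrinking of \(t_{\frac{\alpha}{2},N-1}\) toward \(z_{\frac{\alpha}{2}}\) can be seen numerically. The sketch below assumes SciPy is available for the \(t\) quantiles:

```python
from scipy import stats

alpha = 0.05
# t_{alpha/2, N-1} is larger than z_{alpha/2} but shrinks toward it as N grows
for N in [5, 10, 30, 100, 1000]:
    t_crit = stats.t.ppf(1 - alpha / 2, df=N - 1)
    print(N, round(t_crit, 4))

z_crit = stats.norm.ppf(1 - alpha / 2)   # about 1.96
print("z:", round(z_crit, 4))
```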

The third formula is for estimating the population proportion \(p\) of a \((0,1)\)-valued random variable: \(X = 1\) with probability \(p\) and \(X = 0\) with probability \(1-p\). We have \[\mu= E(X) = x_1 p_1+x_2 p_2 = 1\cdot p+0\cdot(1-p) = p\]

\[\sigma^2=Var(X) = (x_1-E(X))^2 p_1+(x_2-E(X))^2 p_2 = (1-p)^2 p+(0-p)^2(1-p) = p(1-p)\big((1-p)+p\big) = p(1-p)\]

Sample average

\[\overline{X} = \frac{1}{N}\sum_{i=1}^NX_i=\frac{\#\{i : X_i=1\}}{N}=\overline{p}\] Thus the sample average is the sample proportion.

Replacing \(\overline{X}\) by \(\overline{p}\) and \(\sigma\) by \(\sqrt{p(1-p)}\) in the first formula, by the central limit theorem we obtain the third formula. In practice the unknown \(p\) inside the square root is replaced by its estimate \(\overline{p}\).

\(\left( \overline{p} - z_{\frac{\alpha}{2}}\ \sqrt{\frac{p(1 - p)}{N}},\overline{p} + z_{\frac{\alpha}{2}}\ \sqrt{\frac{p(1 - p)}{N}} \right)\)
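A minimal sketch computing a proportion interval, with a hypothetical 0/1 sample and \(p\) estimated by \(\overline{p}\) inside the square root:

```python
from statistics import NormalDist

data = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1] * 20   # hypothetical 0/1 responses, N = 200
N = len(data)
p_bar = sum(data) / N                        # sample average = sample proportion

z = NormalDist().inv_cdf(0.975)              # z_{alpha/2} for alpha = 0.05
se = (p_bar * (1 - p_bar) / N) ** 0.5        # p estimated by p_bar under the root
lo, hi = p_bar - z * se, p_bar + z * se
print(p_bar)      # 0.6
print(lo, hi)
```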

Confidence interval and its relation to data, \(N\), \(\sigma\), and \(\mu\)

In the following interactive diagram you can study the factors influencing the confidence interval by changing the population parameters, the number of data points and the confidence level.

9.2.3 Interpretation of confidence interval

\(P\left( \overline{X} - z_{\frac{\alpha}{2}}\ \frac{\sigma}{\sqrt{N}} < \mu < \overline{X} + z_{\frac{\alpha}{2}}\ \frac{\sigma}{\sqrt{N}} \right) = 1 - \alpha\)

says that if we redo the sampling many times, say 100 times, then for \(1-\alpha=0.95\) about 95 of the confidence intervals calculated using the formula discussed above, \(\left( \overline{x} - z_{\frac{\alpha}{2}}\ \frac{\sigma}{\sqrt{N}},\overline{x} + z_{\frac{\alpha}{2}}\ \frac{\sigma}{\sqrt{N}} \right)\), will contain the parameter \(\mu\).
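This "repeated sampling" interpretation can be checked by simulation (the population parameters below are illustrative):

```python
import random
from statistics import NormalDist

random.seed(3)
mu, sigma, N = 50.0, 8.0, 100     # illustrative population and sample size
z = NormalDist().inv_cdf(0.975)   # z_{alpha/2} for alpha = 0.05
se = sigma / N ** 0.5

# Redo the sampling many times; count how often the interval covers mu
trials, covered = 1000, 0
for _ in range(trials):
    xbar = sum(random.gauss(mu, sigma) for _ in range(N)) / N
    if xbar - z * se < mu < xbar + z * se:
        covered += 1

print(covered / trials)   # close to 0.95
```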

9.3 Quality of an estimator

9.3.1 Consistency

\(\overline{x}\) is one realization of the random variable \(\overline{X}\). Why is it a good estimate of \(\mu\)?

Following the law of large numbers, for i.i.d. sample points we have: \[\hat{\mu}=\overline{X} \overset{p}{\longrightarrow}\mu\] The equation above says the estimator converges to the true parameter in probability. This good property is known as consistency.

9.3.2 Unbiasedness

\[E(\hat{\mu})=E(\overline{X})=\mu\] The equation above says that the estimator is, on average, correct. This good property is known as unbiasedness. Unbiasedness and consistency are two good properties of an estimator.

Demonstration of unbiasedness and consistency

The following interactive diagram illustrates the concepts of unbiasedness and consistency.
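Alongside the diagram, a short simulation sketch (with an illustrative \(N(0,1)\) population) shows both properties:

```python
import random

random.seed(4)
mu, sigma = 0.0, 1.0   # illustrative N(0, 1) population

# Unbiasedness: the average of many independent estimates mu-hat = X-bar
# is close to mu, even though each estimate uses only N = 20 points
estimates = []
for _ in range(5000):
    sample = [random.gauss(mu, sigma) for _ in range(20)]
    estimates.append(sum(sample) / 20)
avg_estimate = sum(estimates) / len(estimates)
print(avg_estimate)   # close to mu = 0

# Consistency: a single estimate gets closer to mu as N grows
for N in [10, 1_000, 100_000]:
    xbar = sum(random.gauss(mu, sigma) for _ in range(N)) / N
    print(N, xbar)
```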


Chapter Summary

Why do we need estimation?

Point estimation: \(\widehat{\mu} = \overline{X}\) and \({\widehat{\sigma}}^{2} = s^{2}\)

Interval estimation: the rule of 95:

\(\left( \overline{X} - z_{\frac{\alpha}{2}}\ \frac{\sigma}{\sqrt{N}},\overline{X} + z_{\frac{\alpha}{2}}\ \frac{\sigma}{\sqrt{N}} \right)\) when \(\sigma\) is known

\(\left( \overline{X} - t_{\frac{\alpha}{2},N - 1}\ \frac{s}{\sqrt{N}},\overline{X} + t_{\frac{\alpha}{2},N - 1}\ \frac{s}{\sqrt{N}} \right)\) when \(\sigma\) is estimated by s

\(\left( \overline{p} - z_{\frac{\alpha}{2}}\ \sqrt{\frac{p(1 - p)}{N}},\overline{p} + z_{\frac{\alpha}{2}}\ \sqrt{\frac{p(1 - p)}{N}} \right)\) for a population proportion

Quality of an estimator: unbiasedness and consistency

\(E\left( \widehat{\mu} \right) = \mu\)

\(\hat{\mu}=\overline{X} \overset{p}{\longrightarrow}\mu\)

Review questions:

  1. Why do we need a point estimate?

  2. Why do we need an interval estimation?

  3. How can we use the rule of 95 to obtain a quick guess of a confidence interval?

  4. What are the two criteria used to judge the quality of an estimator?